Using Random Forest for Protein Fold Prediction Problem: An Empirical Study

Authors

  • Abdollah Dehzangi
  • Somnuk Phon-Amnuaisuk
  • Omid Dehzangi
Abstract

The functioning of a protein in biological reactions crucially depends on its three-dimensional structure. Predicting the three-dimensional (tertiary) structure of a protein from its amino acid sequence (primary structure) is considered a challenging task in bioinformatics and molecular biology. Recently, owing to tremendous advances in the pattern recognition field, there has been growing interest in applying classification approaches to the protein fold prediction problem. In this paper, Random Forest, an ensemble method, is employed to address this problem. Random Forest is a recently introduced method, based on the bagging algorithm, that trains a group of base classifiers on randomly selected sets of features and then combines the results obtained from the base classifiers by majority voting. To investigate the effect of the number of base learners on the performance of Random Forest, twelve different numbers of base classifiers (between 30 and 600) are tested. To study the performance of Random Forest and compare its results with previously reported results, the dataset produced by Ding and Dubchak is used. Our experimental results show that Random Forest improves the prediction accuracy (using the same set of features proposed by Dubchak et al.) and reduces the time consumption of the protein fold prediction task, compared to previous works found in the literature.
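The mechanism the abstract describes (bootstrap sampling, random feature subsets, majority voting) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the toy data from the hypothetical helper `make_toy_data`, the tree counts (5, 15, 30), the depth limit, and the choice to draw a fresh feature subset at every tree node are all assumptions made for the example, and the Ding and Dubchak dataset is not used.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, feature_ids):
    """Best (feature, threshold) among the given features, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, n_sub, rng, depth=0, max_depth=8):
    """Grow one tree; each node only looks at n_sub randomly chosen features.
    Internal nodes are tuples, leaves are labels (labels must not be tuples)."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority(y)
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], n_sub, rng, depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], n_sub, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees, n_sub, seed=0):
    """Bagging: every tree is trained on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Combine the base classifiers by majority voting."""
    return majority([predict_tree(tree, row) for tree in forest])

def make_toy_data(n_per_class=20, seed=1):
    """Hypothetical 3-class data: two informative features, two noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(3):
        for _ in range(n_per_class):
            X.append([10 * k + rng.uniform(-1, 1), -10 * k + rng.uniform(-1, 1),
                      rng.uniform(-1, 1), rng.uniform(-1, 1)])
            y.append(k)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    for n_trees in (5, 15, 30):  # illustrative sizes, not the paper's twelve settings
        forest = train_forest(X, y, n_trees, n_sub=2)
        acc = sum(predict_forest(forest, r) == c for r, c in zip(X, y)) / len(y)
        print(n_trees, "trees: training accuracy", acc)
```

Sweeping `n_trees` in the final loop mirrors the kind of experiment the abstract mentions, where the ensemble size is varied and accuracy is compared across settings. A real study would measure accuracy on held-out data rather than on the training set.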

Similar resources

Classification of Protein Structure (RMSD <= 6 Å) using Physicochemical Properties

The quality of a protein structure can be determined from its physical and chemical properties, which have therefore been used to distinguish native or native-like structures from other predicted structures. In this study, machine learning classification models are explored with six physical and chemical properties to classify the root mean square deviation (RMSD) of the protein structure in absence...

An Optimal Model for Medicine Preparation Using Data Mining

Introduction: Lack of financial resources and liquidity are the main problems of hospitals. Pharmacies are one of the sectors that affect the turnover of hospitals and due to lack of forecast for the use and supply of medicines, at the end of the year, encounter over-inventory, large volumes of expired medicines, and sometimes shortage of medicines. Therefore, medicine prediction using availabl...

Prediction of protein-mannose binding sites using random forest

Mannose is an abundant cell surface monosaccharide and has an important role in many biochemical processes. It binds to a great diversity of receptor proteins. In this study we have employed Random Forest for the prediction of mannose-binding sites. A mannose-binding site is taken to be a sphere around the centroid of the ligand, and the sphere is subdivided into different layers and atom-wise and resi...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important issues in bioinformatics and is used for the prediction of proteins in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to the prediction of intracellular protein locations. These algorithms use the features extracted from pro...

Journal title:
  • J. Inf. Sci. Eng.

Volume 26  Issue

Pages  -

Publication date 2010